Computation can be done using the singular value decomposition (SVD) of X,
X = U \Sigma V^T
If the data is mean-centered (sklearn's PCA does this automatically), the sample covariance matrix is given by
S = \frac{1}{m} X^TX = \frac{1}{m} V\Sigma^T U^T U \Sigma V^T = V\left(\frac{1}{m}\Sigma^2\right)V^T
which is the eigenvalue decomposition of S, with eigenvectors as the columns of V and corresponding eigenvalues as diagonal entries of \frac{1}{m}\Sigma^2.
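This correspondence between the SVD of the centered data and the eigendecomposition of S can be checked numerically. A minimal sketch (the data here is simulated for illustration): the eigenvalues of S should equal the squared singular values divided by m.

```python
import numpy as np

# Simulated data (illustrative only)
rng = np.random.default_rng(0)
m = 500
X = rng.normal(size=(m, 3)) @ rng.normal(size=(3, 3))

# Mean-center the data
X = X - X.mean(axis=0)

# SVD of the centered data matrix: X = U Sigma V^T
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Sample covariance matrix S = (1/m) X^T X
S = X.T @ X / m

# Eigenvalues of S, sorted in decreasing order
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]

# They should match sigma^2 / m, as the derivation shows
match = np.allclose(eigvals, sigma**2 / m)
print(match)  # True
```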
The variance explained by the j-th principal component p_j (the j-th column of V) is \lambda_j = \frac{\sigma_j^2}{m}. Hence, the total variance is
\sum_{j=1}^{n} \lambda_j = \operatorname{trace}(S)
and the fraction of total variance explained by the first k principal components is then
\frac{\sum_{j=1}^{k} \lambda_j}{\operatorname{trace}(S)}
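This ratio is what sklearn reports as `explained_variance_ratio_`. A short sketch (with simulated data) comparing the eigenvalue-based formula above to sklearn's output; since the divisor cancels in the ratio, the two agree even though sklearn divides by m - 1 internally:

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated data with features of very different variances (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) * np.array([5.0, 3.0, 1.0, 0.2])

# Eigenvalues of the sample covariance matrix of the centered data
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / Xc.shape[0]
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]

# Fraction of variance explained by the first k components
k = 2
ratio_eig = eigvals[:k].sum() / np.trace(S)

# Same quantity via sklearn
pca = PCA().fit(X)
ratio_sklearn = pca.explained_variance_ratio_[:k].sum()

close = np.allclose(ratio_eig, ratio_sklearn)
print(close)  # True
```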
PCA in Action
Code
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Generate simulated data
np.random.seed(42)
n_samples = 200
feature1 = np.random.rand(n_samples)
feature2 = feature1 * 1000
X = np.column_stack((feature1, feature2))

# PCA without scaling
pca_no_scale = PCA(n_components=1)
X_pca_no_scale = pca_no_scale.fit_transform(X)

# PCA with scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca_scaled = PCA(n_components=1)
X_pca_scaled = pca_scaled.fit_transform(X_scaled)

print("First principal component (no scaling):", pca_no_scale.components_)
print("First principal component (scaling):", pca_scaled.components_)
First principal component (no scaling): [[0.001 0.9999995]]
First principal component (scaling): [[0.70710678 0.70710678]]
[Figure: projected data without scaling]
[Figure: projected data with scaling]
Summary
PCA is a dimensionality reduction technique that projects data onto directions which explain the most variance in the data.
The principal component directions are eigenvectors of the sample covariance matrix, with corresponding eigenvalues representing the variance explained.
For a proper implementation of PCA, the data must be mean-centered (sklearn's PCA does this by default) and, when features are on different scales, standardized.